Weakly Supervised Action Labeling in Videos under Ordering Constraints
Authors
Abstract
We are given a set of video clips, each annotated with an ordered list of actions, such as “walk”, then “sit”, then “answer phone”, extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip and to learn a discriminative classifier for each action. We formulate the problem as a weakly supervised temporal assignment with ordering constraints. Each video clip is divided into small time intervals, and each interval is assigned one action label while respecting the order in which the action labels appear in the given annotations. We show that the action label assignment can be determined jointly with learning a discriminative classifier for each action. We evaluate the proposed model on a new and challenging dataset of 937 video clips, with a total of 787,720 frames, containing sequences of 16 different actions from 69 Hollywood movies.
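The abstract does not spell out the optimization. The sketch below illustrates only the ordering constraint and rests on three assumptions:

- Each time interval has a fixed cost per action class, for instance negated scores from the current classifiers.
- Every annotated action covers at least one contiguous run of intervals.
- There is no extra background label.

Under these assumptions, dynamic programming finds the cheapest labeling that respects the annotated order. The function name `ordered_assignment` and its cost layout are hypothetical. The sketch does not learn any classifier.

```python
from typing import List, Sequence


def ordered_assignment(costs: Sequence[Sequence[float]],
                       actions: Sequence[int]) -> List[int]:
    """Hypothetical helper: label each time interval so the annotated order holds.

    costs[t][c]: assumed cost of giving interval t the action class c
                 (e.g. a negated classifier score).
    actions:     the ordered annotation, as a list of class indices.
    Assumption: each annotated action occupies at least one contiguous run of
    intervals, in the given order, and there is no background class.
    Returns the per-interval class labels of the minimum-total-cost labeling.
    """
    T, K = len(costs), len(actions)
    if K == 0 or K > T:
        raise ValueError("need 1 <= number of actions <= number of intervals")
    INF = float("inf")
    # best[t][k]: minimum cost of labeling intervals 0..t,
    # with interval t inside the k-th annotated action.
    best = [[INF] * K for _ in range(T)]
    # advanced[t][k]: True if interval t-1 was in action k-1 (i.e. a new action starts at t).
    advanced = [[False] * K for _ in range(T)]
    best[0][0] = costs[0][actions[0]]  # the first interval must start the first action
    for t in range(1, T):
        for k in range(K):
            stay = best[t - 1][k]                      # continue the same action
            advance = best[t - 1][k - 1] if k > 0 else INF  # move to the next action
            if advance < stay:                         # ties prefer staying
                best[t][k] = advance + costs[t][actions[k]]
                advanced[t][k] = True
            else:
                best[t][k] = stay + costs[t][actions[k]]
    # Backtrack: the last interval must belong to the last annotated action.
    labels = [0] * T
    k = K - 1
    for t in range(T - 1, -1, -1):
        labels[t] = actions[k]
        if t > 0 and advanced[t][k]:
            k -= 1
    return labels
```

Consider five intervals whose costs favour class 0 early and class 1 late. With those costs, `ordered_assignment([[0, 1], [0, 1], [0, 1], [1, 0], [1, 0]], [0, 1])` returns `[0, 0, 0, 1, 1]`. A joint scheme could alternate between this step and re-fitting the classifiers on the resulting labels. That alternation is an illustrative assumption, not necessarily how the paper couples assignment and learning.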
Similar resources
Temporal Action Labeling using Action Sets
Action detection and temporal segmentation of actions in videos are topics of increasing interest. While fully supervised systems have gained much attention lately, full annotation of each action within the video is costly and impractical for large amounts of video data. Thus, weakly supervised action detection and temporal segmentation methods are of great importance. While most works in this ...
Finding “It”: Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos
Grounding textual phrases in visual content with standalone image-sentence pairs is a challenging task. When we consider grounding in instructional videos, this problem becomes profoundly more complex: the latent temporal structure of instructional videos breaks independence assumptions and necessitates contextual understanding for resolving ambiguous visual-linguistic cues. Furthermore, dense ...
Action Recognition by Weakly-Supervised Discriminative Region Localization
We present a novel probabilistic model for recognizing actions by identifying and extracting information from discriminative regions in videos. The model is trained in a weakly-supervised manner: training videos are annotated only with training labels, without any action location information within the video. Additionally, we eliminate the need for any pre-processing measures to help shortlist ca...
Similarity Constrained Latent Support Vector Machine: An Application to Weakly Supervised Action Classification
We present a novel algorithm for weakly supervised action classification in videos. We assume we are given training videos annotated only with action class labels. We learn a model that can classify unseen test videos, as well as localize a region of interest in the video that captures the discriminative essence of the action class. A novel Similarity Constrained Latent Support Vector Machine m...
A Weakly-Supervised Approach to Seismic Structure Labeling
With the growing demand for high-resolution subsurface characterization from 3D seismic surveying, the size of 3D seismic data has been dramatically increasing, and correspondingly, the process of interpreting seismic data is becoming more time consuming and labor intensive. Recently, there has been great interest in various supervised machine learning techniques that can help reduce the time an...